Understanding approximate Fisher information for fast convergence of natural gradient descent in wide neural networks
Authors: Ryo Karakida, Kazuki Osawa
Abstract
Natural gradient descent (NGD) helps to accelerate the convergence of gradient dynamics, but it requires approximations in large-scale deep neural networks because of its high computational cost. Empirical studies have confirmed that some NGD methods with approximate Fisher information converge sufficiently fast in practice. Nevertheless, it remains unclear from a theoretical perspective why and under what conditions such heuristic approximations work well. In this work, we reveal that, under specific conditions, NGD with approximate Fisher information achieves the same fast convergence to global minima as exact NGD. We consider deep neural networks in the infinite-width limit and analyze the asymptotic training dynamics of NGD in function space via the neural tangent kernel. In function space, the training dynamics with the approximate Fisher information are identical to those with the exact Fisher information, and they converge quickly. This fast convergence holds in layer-wise approximations; for instance, in the block-diagonal approximation where each block corresponds to a layer, as well as in the block tri-diagonal and K-FAC approximations. We also find that a unit-wise approximation achieves the same fast convergence under some assumptions. All of these different approximations have an isotropic gradient in function space, and this plays a fundamental role in achieving the same convergence properties in training. Thus, the current study gives a novel and unified theoretical foundation with which to understand NGD methods in deep learning.
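The result stated above concerns the infinite-width limit, but it can be probed numerically at finite width. The sketch below is only an illustration of the comparison (not the paper's analysis or code): it builds the Gauss-Newton approximation J^T J / n of the Fisher for a small one-hidden-layer network under squared loss, computes the NGD direction with the full matrix and with a layer-wise block-diagonal approximation, and compares the two resulting updates in function space. The network size, NTK-style scaling, and damping value are choices made here for the example.

```python
# Minimal numerical sketch: exact NGD vs. layer-wise block-diagonal NGD
# on a small randomly initialized one-hidden-layer network.
# Assumptions: squared loss, Fisher approximated by the Gauss-Newton
# matrix J^T J / n, and a small damping term for invertibility.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 20, 5, 512                  # samples, input dim, hidden width
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# NTK-style parameterization: f(x) = w2 . tanh(W1 x) / sqrt(h)
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
w2 = rng.normal(size=h)

def forward_and_jacobian(X):
    Z = X @ W1.T                      # pre-activations, (n, h)
    A = np.tanh(Z)                    # activations
    f = A @ w2 / np.sqrt(h)           # outputs, (n,)
    # Jacobian of f w.r.t. [vec(W1), w2], shape (n, h*d + h)
    dW1 = (w2 * (1 - A**2) / np.sqrt(h))[:, :, None] * X[:, None, :]
    dw2 = A / np.sqrt(h)
    return f, np.concatenate([dW1.reshape(n, -1), dw2], axis=1)

f, J = forward_and_jacobian(X)
r = f - y                             # residual drives the update
lam = 1e-3                            # damping

def ngd_direction(J_blocks):
    """NGD direction using a (block-)Fisher built from the given Jacobian blocks."""
    deltas = []
    for Jb in J_blocks:
        Fb = Jb.T @ Jb / n + lam * np.eye(Jb.shape[1])
        deltas.append(-np.linalg.solve(Fb, Jb.T @ r / n))
    return np.concatenate(deltas)

p1 = h * d                                             # first-layer parameter count
delta_exact = ngd_direction([J])                       # full Fisher
delta_block = ngd_direction([J[:, :p1], J[:, p1:]])    # layer-wise block diagonal

# Compare the induced function-space updates J @ delta
u_exact, u_block = J @ delta_exact, J @ delta_block
cos = u_exact @ u_block / (np.linalg.norm(u_exact) * np.linalg.norm(u_block))
print(f"cosine similarity of function-space updates: {cos:.4f}")
```

In this toy setting the two function-space updates come out nearly parallel, with magnitudes differing roughly by the number of blocks, which is in line with the isotropic-gradient picture described in the abstract (up to a rescaling of the learning rate).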
Similar resources
Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns
The purpose of this study is to analyze the performance of the backpropagation algorithm with changing training patterns and a second momentum term in feed-forward neural networks. The analysis is conducted on 250 different words composed of three lowercase letters of the English alphabet. These words are presented to two vertical segmentation programs, designed in MATLAB, which are based on portions (1...
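For reference, the basic ingredient such studies tune is gradient descent with a momentum term. The sketch below is a generic illustration of that update rule on a simple quadratic objective, not the modified technique or the MATLAB segmentation pipeline described in the excerpt; the learning rate and momentum coefficient are arbitrary choices.

```python
# Generic gradient descent with a momentum term: the velocity accumulates
# a decaying sum of past gradients and is added to the weights each step.
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity

# Usage on a toy quadratic with minimum at w* = (1, -2, 0.5)
w, v = np.zeros(3), np.zeros(3)
for _ in range(200):
    grad = 2 * (w - np.array([1.0, -2.0, 0.5]))
    w, v = momentum_step(w, grad, v)
print(w)  # approaches the minimum
```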
Natural Gradient Descent for Training Stochastic Complex-Valued Neural Networks
In this paper, the natural gradient descent method for multilayer stochastic complex-valued neural networks is considered, and the natural gradient is given for a single stochastic complex-valued neuron as an example. Since the space of the learnable parameters of stochastic complex-valued neural networks is not a Euclidean space but a curved manifold, the complex-valued natural gradient ...
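The update in question has the generic natural-gradient form Δw = −η F⁻¹∇L, where F is the Fisher information of the neuron's output distribution. As a hedged illustration only, the sketch below uses a real-valued logistic (Bernoulli) neuron rather than the complex-valued model of the excerpt, because its Fisher matrix has the simple closed form E[p(1−p) x xᵀ]; the data, learning rate, and damping are invented for the example.

```python
# Natural-gradient step for a single real-valued stochastic (Bernoulli) neuron.
import numpy as np

def natural_gradient_step(w, X, y, lr=0.1, damping=1e-4):
    p = 1.0 / (1.0 + np.exp(-X @ w))                  # firing probability
    grad = X.T @ (p - y) / len(y)                     # gradient of the log-loss
    # Fisher information of the Bernoulli neuron: E[p(1-p) x x^T]
    F = (X * (p * (1 - p))[:, None]).T @ X / len(y)
    return w - lr * np.linalg.solve(F + damping * np.eye(len(w)), grad)

# Usage on synthetic data generated by a known weight vector
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
w_true = np.array([1.0, -1.0, 0.5, 0.0])
y = (rng.random(200) < 1 / (1 + np.exp(-X @ w_true))).astype(float)
w = np.zeros(4)
for _ in range(50):
    w = natural_gradient_step(w, X, y)
```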
Gradient Descent for Spiking Neural Networks
Much of the work on neural computation is based on network models of static neurons that produce analog output, despite the fact that information processing in the brain is predominantly carried out by dynamic neurons that produce discrete pulses called spikes. Research in spike-based computation has been impeded by the lack of an efficient supervised learning algorithm for spiking networks. Here,...
Curiously Fast Convergence of some Stochastic Gradient Descent Algorithms
1 Context. Given a finite set of m examples $z_1, \ldots, z_m$ and a strictly convex differentiable loss function $\ell(z, \theta)$ defined on a parameter vector $\theta \in \mathbb{R}^d$, we are interested in minimizing the cost function $\min_\theta C(\theta) = \frac{1}{m} \sum_{i=1}^{m} \ell(z_i, \theta)$. One way to perform such a minimization is to use a stochastic gradient algorithm. Starting from some initial value $\theta[1]$, iteration t consists in picking ...
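A minimal sketch of the iteration being described, assuming a least-squares loss ℓ(z, θ) = ½(zᵀθ − y)² and a constant step size (the excerpt is cut off before the step-size schedule is specified):

```python
# Stochastic gradient descent on an empirical cost: at each step, pick one
# example at random and update theta with the gradient of its individual loss.
import numpy as np

def sgd(Z, targets, grad_loss, theta0, lr=0.01, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    for _ in range(steps):
        i = rng.integers(len(Z))                      # pick one example at random
        theta -= lr * grad_loss(Z[i], targets[i], theta)
    return theta

# Usage with a least-squares loss l(z, theta) = 0.5 * (z . theta - y)^2
grad_ls = lambda z, y, theta: (z @ theta - y) * z
rng = np.random.default_rng(1)
Z = rng.normal(size=(100, 3))
targets = Z @ np.array([0.5, -1.0, 2.0])
theta_hat = sgd(Z, targets, grad_ls, np.zeros(3))
```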
Convergence of Gradient Descent Algorithm with Penalty Term for Recurrent Neural Networks
This paper investigates a gradient descent algorithm with a penalty term for a recurrent neural network. The penalty considered here is a term proportional to the norm of the weights; its primary role in the method is to control the magnitude of the weights. After proving that all of the weights remain automatically bounded during the iteration process, we also present some deterministic convergenc...
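The penalized update itself is simple: the gradient of the penalty is added to the loss gradient at every step, which is what keeps the weight magnitudes under control. A minimal sketch, assuming the common squared-norm (L2) form of the penalty rather than the exact form used in the paper, and omitting the recurrent-network specifics:

```python
# Gradient step with a weight penalty: penalty * w is the gradient of the
# term 0.5 * penalty * ||w||^2 added to the training loss.
import numpy as np

def penalized_step(w, grad, lr=0.01, penalty=1e-3):
    return w - lr * (grad + penalty * w)

# Usage: one update of a small weight vector given some loss gradient
w = np.array([0.5, -1.2, 3.0])
w = penalized_step(w, grad=np.array([0.1, 0.0, -0.2]))
```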
Journal
Journal title: Journal of Statistical Mechanics: Theory and Experiment
Year: 2021
ISSN: 1742-5468
DOI: https://doi.org/10.1088/1742-5468/ac3ae3